You can find the source code for this project at: https://github.com/hdickie/Mini-Challenge-1
The VAST Challenge was designed to test the skills of visual analytics researchers and developers. The VAST Challenge was composed of three mini challenges; my colleague Jean-Carlos Paredes and I worked together on the first. The main text of the first challenge was as follows:
“As part of his investigation, ornithology student Mitch Vogel needs to examine the movement of traffic through the Boonsong Lekagul Nature Preserve. His first working hypothesis is that there is some link between the traffic going through the preserve and the decline in the nesting Rose-crested Blue Pipit - maybe the traffic noises are drowning out mating calls! Or perhaps he can discover some odd goings on in the traffic patterns - perhaps campers are invading the bird’s habitat areas?”
Our understanding was: expect to find something weird, and relate this to birds.
The data set had 171,477 rows of 4 columns: Timestamp, Vehicle ID, Vehicle Type, and Location (one of the labeled points on the map). The data were collected over a period of 13 months, from 5/1/2015 to 5/31/2016.
Data Snippet
2015-05-01 01:12:42,20151201011242-330,5,entrance0
2015-05-01 01:14:22,20151201011242-330,5,general-gate1
2015-05-01 01:17:13,20151201011242-330,5,ranger-stop2
| Vehicle Type | Description |
|---|---|
| 1 | 2 axle car (or motorcycle) |
| 2 | 2 axle truck |
| 2P | 2 axle truck (Park Vehicle) |
| 3 | 3 axle truck |
| 4 | 4 axle (and above) truck |
| 5 | 2 axle bus |
| 6 | 3 axle bus |
A map of the fictional park was provided for context.
We wrote Python code that uses the timestamps to generate, for each vehicle, a path through the park of the following form: {Node, time-delta, Node, time-delta, … , Node}.
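Here is a minimal sketch of how such paths could be built. It is not the project's actual code: the function name `build_paths` and the choice to store time deltas in seconds are assumptions made for illustration. The three input rows are the ones from the data snippet above.

```python
import csv
from collections import defaultdict
from datetime import datetime
from io import StringIO

# Rows copied from the data snippet: timestamp, vehicle ID, vehicle type, location.
SAMPLE = """\
2015-05-01 01:12:42,20151201011242-330,5,entrance0
2015-05-01 01:14:22,20151201011242-330,5,general-gate1
2015-05-01 01:17:13,20151201011242-330,5,ranger-stop2
"""


def build_paths(lines):
    """Group readings by vehicle ID and interleave nodes with time deltas (seconds)."""
    readings = defaultdict(list)
    for timestamp, vehicle_id, _vehicle_type, location in csv.reader(lines):
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        readings[vehicle_id].append((when, location))

    paths = {}
    for vehicle_id, stops in readings.items():
        stops.sort()  # chronological order
        path = [stops[0][1]]
        for (prev_time, _), (curr_time, node) in zip(stops, stops[1:]):
            path.append((curr_time - prev_time).total_seconds())
            path.append(node)
        paths[vehicle_id] = path
    return paths


paths = build_paths(StringIO(SAMPLE))
print(paths["20151201011242-330"])
# ['entrance0', 100.0, 'general-gate1', 171.0, 'ranger-stop2']
```

Keeping the deltas as plain numbers makes later steps, such as finding the longest stop on a path, a matter of slicing the list.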
Then, using these paths, we computed the population of each of the campsites at the time the vehicle entered the park (do new campers avoid high population campsites?). This was an array of 9 numbers, one integer for each campsite.
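The population feature can be sketched roughly as follows. This assumes each campsite stay has already been reduced to an (index, arrival, departure) tuple and that "population" means the number of vehicles present. The stays below are invented for illustration, and `campsite_populations` is a hypothetical helper name.

```python
from datetime import datetime

NUM_CAMPSITES = 9  # one integer per campsite, as described above


def campsite_populations(stays, when):
    """Count vehicles present at each campsite at time `when`.

    `stays` is a list of (campsite_index, arrived, departed) tuples.
    """
    counts = [0] * NUM_CAMPSITES
    for campsite, arrived, departed in stays:
        if arrived <= when < departed:
            counts[campsite] += 1
    return counts


# Hypothetical stays, invented purely for illustration.
stays = [
    (0, datetime(2015, 5, 1, 9), datetime(2015, 5, 2, 9)),
    (0, datetime(2015, 5, 1, 18), datetime(2015, 5, 3, 8)),
    (5, datetime(2015, 5, 2, 7), datetime(2015, 5, 2, 20)),
]
entry_time = datetime(2015, 5, 2, 8)  # when a new vehicle enters the park
feature = campsite_populations(stays, entry_time)
print(feature)
# [2, 0, 0, 0, 0, 1, 0, 0, 0]
```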
To ensure the validity of our extracted features, we modeled the park as a network, and created population-over-time plots to catch any obvious errors.
What if bird populations are plummeting because someone has taken a fancy to off-road shenanigans? The park was small enough that we could draw out the connections by hand.
It looks like a cute bug to me! Its “body” and “eyes” are completely connected subgraphs with their nodes listed above. Other nodes are labeled.
We wrote this out as an adjacency matrix in Excel, and then used Python code to check that paths never skipped past nodes in ways they shouldn’t. We found no evidence of off-road behavior.
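A check of this kind might look like the sketch below. The four-node graph is a made-up fragment, not the real park layout, and the node name `camping5` is an assumption. The sketch also assumes that two consecutive readings at the same node are legal (a vehicle arriving at and then leaving one spot).

```python
# Hypothetical four-node fragment; the real park graph is larger and wired differently.
NODES = ["entrance0", "general-gate1", "ranger-stop2", "camping5"]
INDEX = {name: i for i, name in enumerate(NODES)}
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def illegal_hops(node_sequence):
    """Return consecutive node pairs that are not connected in the adjacency matrix."""
    return [
        (a, b)
        for a, b in zip(node_sequence, node_sequence[1:])
        if a != b and not ADJACENCY[INDEX[a]][INDEX[b]]
    ]


legal = illegal_hops(["entrance0", "general-gate1", "ranger-stop2"])
skipped = illegal_hops(["entrance0", "ranger-stop2"])
print(legal)    # []
print(skipped)  # [('entrance0', 'ranger-stop2')]
```

Any non-empty result would mean a vehicle appeared at a node it could not have reached directly from its previous one.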
The only way that occurred to us to validate the campsite-population-over-time plots was to check these three criteria:
All Park Activity Over Time
Individual Campsite Populations Over Time (Campsite 0 - 8, left to right & top to bottom)
These plots don’t lead us to reject our extracted feature.
They also show us two things: Campsite #1 was extremely unpopular relative to the others, and also there does not seem to be a maximum number of people that can stay at a campsite at one time.
Campsite #5 reached a record population count of 74. If there were some kind of wait-list phenomenon, we would expect to see it sustain that value. Even if people don’t wait around for an open spot, if there were a limited number of open spots we should expect to see that maximum value reached many times (or at least more than once).
From these data, we cannot conclude whether or not these campsites had maximum capacities. Furthermore, even if they did, those capacities do not seem to have been reached during the period these data were recorded.
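The reasoning about capacity can be made concrete with a small sketch: count how many separate times a population series sits at its peak. A hard cap should produce several such episodes, while a one-off record produces exactly one. Both series below are invented for illustration, and `peak_episodes` is a hypothetical helper.

```python
def peak_episodes(series):
    """Return (peak value, number of separate runs during which the series is at its peak)."""
    peak = max(series)
    episodes = 0
    previous_at_peak = False
    for value in series:
        at_peak = value == peak
        if at_peak and not previous_at_peak:
            episodes += 1  # a new run at the peak begins here
        previous_at_peak = at_peak
    return peak, episodes


# Hypothetical daily counts, invented for illustration.
uncapped = [10, 31, 74, 40, 22, 35]
capped = [10, 50, 50, 42, 50, 50, 50, 30, 50]

print(peak_episodes(uncapped))  # (74, 1)
print(peak_episodes(capped))    # (50, 3)
```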
Using the type of vehicle and the paths, we were able to identify a few types of behavior:
| Class Name | Number Observed | Percent of Observations |
|---|---|---|
| Thru Traffic | 8008 | 42.83% |
| Campers | 6513 | 34.84% |
| Park Rangers | 998 | 5.34% |
| Mystery Car | 1 | 0.01% |
| Unclassified | 3176 | 16.99% |
| Total | 18696 | 100.00% |
A little more than 40% of all vehicles moved through the park without significant stops. (The longest any Thru Traffic vehicle spent between gates was 26 minutes.)
A typical traffic route is shown below. Each route is shown on additional plots here.
About 35% of paths went to a campsite, stayed there for a significant amount of time (we arbitrarily chose 12 hours), and then left the park. In this case, the vehicle left along the same path it came, but this was not always the case.
Park rangers were identified by their vehicle type, and frequently took a thorough tour of the park along a patrol-like route. They frequently originated from and returned to the ranger base, but this was not always the case.
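The rules above could be expressed as a classifier along these lines. This is a sketch under several assumptions rather than the code we ran: campsite node names are assumed to start with `camping`, the gap after a campsite reading is taken as the time spent there, and the 30-minute Thru Traffic cutoff is a made-up value (the only figure stated above is the 26-minute observed maximum). The Mystery Car was singled out by hand and is not handled here.

```python
CAMPER_MIN_STAY = 12 * 3600  # seconds; the 12-hour cutoff described above
THRU_MAX_GAP = 30 * 60       # seconds; hypothetical cutoff, chosen for illustration


def classify(vehicle_type, path):
    """Assign a behavior class to a path of the form [node, delta, node, ...]."""
    if vehicle_type == "2P":  # park vehicles are identified by type alone
        return "Park Rangers"
    nodes, gaps = path[0::2], path[1::2]
    # Assumption: the gap following a campsite reading approximates the stay there.
    for node, gap in zip(nodes, gaps):
        if node.startswith("camping") and gap >= CAMPER_MIN_STAY:
            return "Campers"
    visited_campsite = any(node.startswith("camping") for node in nodes)
    if gaps and max(gaps) <= THRU_MAX_GAP and not visited_campsite:
        return "Thru Traffic"
    return "Unclassified"


# Hypothetical paths, invented for illustration.
print(classify("2P", ["ranger-base", 600, "ranger-stop2"]))
print(classify("1", ["entrance0", 900, "camping5", 50000, "camping5", 800, "entrance0"]))
print(classify("1", ["entrance0", 100, "general-gate1", 171, "entrance3"]))
print(classify("3", ["entrance0", 7200, "general-gate1"]))
```

Checking the vehicle type first means a ranger who happens to linger at a campsite is still counted as a ranger.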
One vehicle stood out sharply against the others. Car ID 20155705025759-63 entered the park on June 5, 2015 around 3pm (about a month after data collection began) and remained in the park for at least 360 days, until data collection stopped. The car moved between campsites and would stay at each for about 30 days before moving to the next. The car did not visit campsites 1, 7, or 8, but visited all the others at least twice. Whatever this car is doing, it is certainly anomalous. If I could advise the park rangers within a month of when they stopped data collection, I would tell them to investigate campsite #5 where the vehicle was last seen!
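An outlier like this is easy to surface once every vehicle has a path: add up the time deltas and flag anything that stays far longer than a normal visit. The sketch below uses invented vehicle IDs and paths, and the 14-day threshold is an arbitrary assumption.

```python
SECONDS_PER_DAY = 86400
LONG_STAY_DAYS = 14  # hypothetical threshold, chosen for illustration


def days_in_park(path):
    """Total time spanned by a path [node, delta, node, ...], with deltas in seconds."""
    return sum(path[1::2]) / SECONDS_PER_DAY


# Hypothetical paths keyed by made-up IDs.
paths = {
    "car-a": ["entrance0", 900, "general-gate1", 600, "entrance3"],
    "car-b": ["entrance1", 1200, "camping5", 30 * SECONDS_PER_DAY, "camping5", 1800, "camping2"],
}

long_stays = {
    vehicle_id: round(days_in_park(path), 2)
    for vehicle_id, path in paths.items()
    if days_in_park(path) > LONG_STAY_DAYS
}
print(long_stays)
# {'car-b': 30.03}
```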